Development and validation of a novel risk classification tool for predicting long length of stay in NICU blood transfusion infants

Newborns are among the primary recipients of blood transfusions, and transfusion may be associated with unfavorable outcomes. Such complications not only imperil the lives of newborns but also prolong hospitalization. Our objective was to explore the predictor variables that may lead to extended hospital stays in neonatal intensive care unit (NICU) patients who have undergone blood transfusions and to develop a predictive nomogram. A retrospective review of 539 neonates who underwent blood transfusion was conducted, using the median and interquartile range to describe their length of stay (LOS). Neonates with LOS above the 75th percentile (P75) were categorized as having a long LOS. Least absolute shrinkage and selection operator (LASSO) regression was employed to screen variables, and a multiple logistic regression prediction model was then constructed using the variables selected by LASSO. The discrimination of the prediction model was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) and its confidence interval. Calibration curves were used to further validate the model's calibration and predictability, and its clinical effectiveness was assessed through decision curve analysis. Fivefold cross-validation was employed to evaluate the generalizability of the model, and internal validation was performed using bootstrap validation. Among the 539 infants who received blood transfusions, 398 (≤ P75) had a LOS within the normal range of up to 34 days, whereas 141 (> P75) experienced a long LOS beyond that range. The predictive model included six variables: gestational age (GA) (< 28 weeks), birth weight (BW) (< 1000 g), type of respiratory support, umbilical venous catheter (UVC), sepsis, and resuscitation frequency. The AUC was 0.851 (95% CI 0.805–0.891) for the training set and 0.859 (95% CI 0.789–0.920) for the validation set. Fivefold cross-validation indicated that the model has good generalization ability. The calibration curve showed close agreement between the predicted risk and the observed risk, indicating good consistency. Decision curve analysis indicated that the model had greater clinical utility when the intervention threshold was set above 2%. Our study has produced a novel nomogram that can assist clinicians in predicting the probability of long hospitalization in blood-transfused infants with reasonable accuracy. Our findings indicate that GA (< 28 weeks), BW (< 1000 g), type of respiratory support, UVC, sepsis, and resuscitation frequency are associated with a higher likelihood of extended hospital stays among newborns who have received blood transfusions.

www.nature.com/scientificreports/

Providing red blood cell (RBC) transfusions can improve oxygen delivery to tissues, thereby reducing the likelihood of apnoeic episodes and promoting weight gain and growth among premature newborns 4.
Nevertheless, an increasing body of evidence suggests a link between transfusion exposure and negative outcomes. Existing studies indicate that blood transfusion is not only closely related to mortality in children but may also be related to complications of prematurity such as intraventricular hemorrhage (IVH), bronchopulmonary dysplasia (BPD), and necrotizing enterocolitis (NEC) 5–7. These complications not only threaten the lives of newborns but also lead to long hospitalization. Some studies have shown that transfused infants have a longer hospital or NICU length of stay than non-transfused infants 8,9. Long hospitalization exposes them to the medical environment for a longer duration, thereby increasing the risk of hospital-acquired infections and other complications 10. It also places a heavy burden on neonatal care. Moreover, an extended LOS may hinder the establishment of parental bonding and interaction with the newborn, potentially causing significant emotional and financial stress for the family 11. The long LOS of these patients in the neonatal intensive care unit has therefore become a worrisome issue, making accurate prediction of LOS in the neonatal ward increasingly crucial. Furthermore, research on LOS focused specifically on transfused patients is currently limited.
Accordingly, our objective is to use NICU data to explore the predictor variables that may lead to extended hospital stays in NICU patients who have undergone blood transfusions and to develop a predictive nomogram, in order to provide more evidence for the prevention of long hospital stays and the optimization of resource allocation for NICU patients.

Methods
This retrospective study used data from newborns registered at the First Affiliated Hospital of Xinjiang Medical University. The research protocol was approved by the Ethics Committee of the First Affiliated Hospital of Xinjiang Medical University in Urumqi. Given the retrospective nature of the study, the requirement for written informed consent was waived.

Study population
Neonates who received blood transfusions at the First Affiliated Hospital of Xinjiang Medical University between May 1, 2021, and May 1, 2023, were included in this study. The inclusion criteria were: (1) at least one blood transfusion during hospitalization; (2) age at admission less than 1 day; and (3) admission to the NICU rather than a general ward. The exclusion criteria were: (1) rapid discharge or non-prescription discharge for non-medical reasons; (2) neonatal death during NICU hospitalization or before admission; (3) chromosomal abnormalities or severe congenital malformations; (4) surgery during hospitalization; and (5) more than 10% missing data in general conditions, complications, and laboratory measurements.

Definitions
We defined long LOS as a LOS exceeding the 75th percentile 13,14. As continuous variables, gestational age and weight did not show significant differences (p > 0.05). We therefore explored various categorizations of weight and gestational age. In the end, we found that dichotomizing gestational age (< 28 weeks, ≥ 28 weeks) and weight (< 1000 g, ≥ 1000 g) yielded more significant differences for our study.
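The percentile-based definition can be illustrated with a minimal Python sketch. The LOS values below are hypothetical and are not study data, and the quantile convention (the "inclusive" method of the standard library) is an assumption; the analysis in this study was performed in R, which may interpolate percentiles differently.

```python
from statistics import quantiles

def label_long_los(los_days):
    """Return (p75, labels); labels[i] is True when los_days[i] exceeds P75."""
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
    p75 = quantiles(los_days, n=4, method="inclusive")[2]
    return p75, [days > p75 for days in los_days]

# Hypothetical lengths of stay in days (illustration only, not study data).
example_los = [5, 8, 12, 15, 20, 28, 34, 41, 60]
p75, labels = label_long_los(example_los)
print(p75, sum(labels))
```

Using a strict inequality keeps infants whose stay equals the threshold in the normal LOS group, which matches the wording "exceeding the 75th percentile".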

Development and assessment of the nomogram
Subjects were randomly split into two groups. Two-thirds were assigned to the training set, which was used to identify the predictor variables associated with long hospitalization in newborns who received blood transfusions and to develop a prediction scoring model. The remaining one-third formed the validation set, which was used to evaluate the effectiveness of the prediction scoring system.
The least absolute shrinkage and selection operator (LASSO) method was used to identify the optimal predictive variables among the candidate predictors 15. LASSO regression, which incorporates L1 regularization, is particularly well suited to datasets with highly correlated predictor variables. By introducing a penalty term, it tends to retain a subset of variables while shrinking the coefficients of other highly correlated variables towards zero, which mitigates the impact of multicollinearity and provides more stable and reliable estimates. Variables with nonzero coefficients in the LASSO model were then entered into a multivariable logistic regression analysis to construct a predictive model for long LOS, and a nomogram was created from all the predictors selected by LASSO.
The goodness of fit of the model was evaluated using the Hosmer-Lemeshow test and the coefficient of determination (R2). Predictive accuracy and conformity were assessed by examining the ROC and calibration curves and by metrics such as the area under the ROC curve (AUC). Decision curve analysis (DCA) was used to assess the net benefit of the model for patients. Discrimination and calibration were checked through bootstrapping with 1000 resamples. Finally, to enhance the credibility of the model evaluation, we used fivefold cross-validation. This method divides the dataset into five subsets of approximately equal size; the model is trained on four subsets while the remaining subset serves as the validation set, and the process is repeated so that performance is assessed across five iterations, each with a different validation subset.
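The fivefold partitioning described above can be sketched in Python as follows. This is an illustration of the splitting step only: model fitting is omitted, the function name and random seed are hypothetical, and the study's own implementation in R is not reproduced here.

```python
import random

def five_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx

# 539 matches the cohort size reported in this study.
splits = list(five_fold_indices(539))
fold_sizes = [len(valid) for _, valid in splits]
print(fold_sizes)
```

Each subject appears in exactly one validation fold, so the five performance estimates are computed on disjoint subsets before being averaged.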

Statistical analysis
Statistical analysis was conducted using R software, version 4.1.3 (available at https://www.R-project.org). Categorical data are presented as numbers and percentages, while continuous variables are reported as mean ± standard deviation (SD) if they follow a normal distribution, or as median (interquartile range [IQR]) if they do not. The χ2 test or Fisher's exact test was used to compare categorical variables. For continuous variables with a normal distribution, independent-samples t-tests were used to compare means. All statistical tests were two-sided, and p-values ≤ 0.05 were considered statistically significant.

Ethics approval and consent to participate
This study followed the Declaration of Helsinki and was approved by the Ethics Committee board of Xinjiang Medical University Affiliated First Hospital. Due to the retrospective nature of the study, the need for informed consent was waived by the Ethics Committee board of Xinjiang Medical University Affiliated First Hospital. All methods were carried out in accordance with relevant guidelines and regulations. No biological specimens were used in this study.

Baseline characteristics of included neonates
The study analyzed a total of 539 infants, of whom 398 had hospital stays shorter than the 75th percentile (normal LOS) and 141 had hospital stays longer than the 75th percentile (long LOS) (Fig. 1). The baseline characteristics of these two groups are presented in Table 1. Compared with children in the normal LOS group, a significantly higher proportion of children in the long LOS group had a gestational age of less than 28 weeks (19.9% vs. 1.76%; p < 0.001) and a weight of less than 1000 g (29.8% vs. 2.01%; p < 0.001). Statistically significant differences (p < 0.05) were also observed between the two groups in Apgar score, feeding patterns, respiratory support, UVC, RDS, NEC, pneumorrhagia, sepsis, rescue frequency (≥ 3 times), HCO3, ALB, CK, and urea, as shown in Table 1.

Variables selection
Based on the data from the training set, we conducted LASSO regression analysis to identify independent predictor variables that significantly affect long LOS. The LASSO analysis reduced the initial 57 perinatal variables to six potential predictors, a ratio of 9.5:1 (Fig. 2A,B). The six potential predictors identified through the LASSO analysis were GA (< 28 weeks), BW (< 1000 g), respiratory support type, umbilical venous catheter (UVC) use, sepsis, and rescue frequency (≥ 3 times) (Fig. 2C).

Risk prediction nomogram development
A logistic regression model was constructed using the six predictor variables identified by LASSO: GA (< 28 weeks), BW (< 1000 g), respiratory support type, UVC use, sepsis, and rescue frequency (≥ 3 times). Table 2 presents the logistic regression model, with the coefficients and corresponding p-values for each of the six predictor variables. These coefficients indicate the strength and direction of the association between each predictor variable and the outcome (long LOS). As shown in Fig. 3, the risk of an extended hospital stay for each transfused patient can be estimated from the cumulative points assigned on the nomogram. A higher total score indicates a greater likelihood of long hospitalization. Previous research has identified specific clinical features as predictor variables for long LOS. To enhance our model, we incorporated additional features and evaluated their discriminatory ability (Table 3). However, adding these predictor variables did not lead to significant improvements in the validation set and may have produced overfitted models. Consequently, we decided to use the nomogram as our final model. The R2 of our model was 0.331. Furthermore, we used fivefold cross-validation to evaluate the generalizability of our model. The results, depicted in Fig. 6, demonstrate satisfactory performance.
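How nomogram points relate to logistic regression coefficients can be sketched as follows. The intercept and coefficients below are hypothetical placeholders and are not the values reported in Table 2; respiratory support type is simplified here to a single binary indicator, which is an assumption made only to keep the example short. The scaling shown (the largest coefficient maps to 100 points) is one common nomogram convention and may differ in detail from the software used in this study.

```python
import math

# Hypothetical log-odds coefficients for illustration only; these are NOT
# the estimates reported in Table 2.
INTERCEPT = -3.0
COEFS = {
    "ga_lt_28w": 1.2,
    "bw_lt_1000g": 1.5,
    "invasive_support": 1.0,  # simplification of "respiratory support type"
    "uvc": 0.8,
    "sepsis": 0.9,
    "rescue_ge_3": 1.1,
}

def nomogram_points(coefs):
    """Scale binary predictors so the largest coefficient maps to 100 points."""
    max_beta = max(abs(beta) for beta in coefs.values())
    return {name: 100.0 * beta / max_beta for name, beta in coefs.items()}

def predicted_risk(patient, coefs=COEFS, intercept=INTERCEPT):
    """Logistic-model probability for a dict of 0/1 predictor values."""
    linear = intercept + sum(coefs[name] * patient.get(name, 0) for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))

points = nomogram_points(COEFS)
patient = {"ga_lt_28w": 1, "sepsis": 1}  # hypothetical infant
total_points = sum(points[name] for name, value in patient.items() if value)
print(round(total_points, 1), round(predicted_risk(patient), 3))
```

Because the points are a linear rescaling of the coefficients, a higher total score always corresponds to a higher predicted probability, which is why the nomogram can be read without a calculator.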

Clinical effectiveness of the model
Figure 7 displays the decision curve analysis for the long LOS nomogram for children undergoing blood transfusion. This analysis reveals that the model is relevant across a wide range of risk thresholds.
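For readers unfamiliar with decision curve analysis, the net benefit plotted on such a curve can be computed as in the sketch below. It uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability; the outcomes and predicted probabilities are hypothetical, and the function names are illustrative rather than taken from the study's code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical outcomes (1 = long LOS) and predicted risks, not study data.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.7, 0.2, 0.1, 0.4, 0.6, 0.05, 0.3]
nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

The treat-none strategy has a net benefit of zero by definition, so a model is clinically useful at a given threshold when its net benefit exceeds both zero and the treat-all value.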

Discussion
The LOS in the NICU has been a focal point of research. Long LOS is correlated with hospital-acquired conditions and with adverse events in healthcare 16,17. Blood transfusion is a frequently performed procedure for neonates who require intensive care, especially preterm neonates. However, there is limited literature on predicting LOS specifically for this high-risk patient population.
Early identification of the risk of long NICU LOS in neonates who received transfusion therapy is not only crucial for counseling families but may also guide decisions on optimal clinical interventions. In this study, we therefore used historical clinical data from the NICU to identify important predictor variables and developed a predictive model for long NICU LOS in neonates receiving blood transfusions. We used a routinely applied machine learning algorithm (LASSO) to select the predictor variables. We identified six independent characteristics: GA (< 28 weeks), BW (< 1000 g), respiratory support type, UVC, sepsis, and rescue frequency (≥ 3 times). Our model demonstrated good discriminatory ability in both the training and validation sets.
GA and BW are frequently used to evaluate newborn infants. We found that GA lower than 28 weeks and BW less than 1000 g markedly increased the probability of long LOS. Our findings are consistent with previous research indicating that birth weight and gestational age are the primary predictor variables influencing long length of stay in the NICU 18,19. The incidence of anemia is high among premature infants. This is due to several factors, including their smaller circulating blood volume, the shorter lifespan of their red blood cells, and an immature bone marrow response to anemia. The immature hepatic receptors in premature infants are relatively insensitive to tissue hypoxia, and their plasma erythropoietin (EPO) levels are also low 20. The degree of deficiency is especially significant in the smallest and least mature infants 21. Another crucial factor is that premature infants require careful monitoring of various parameters, which often involves repeated blood sampling for laboratory analysis. During the initial hospitalization period, premature infants may require more red blood cell transfusions than full-term infants in order to raise hemoglobin levels and improve blood oxygenation capacity 22.
In this study, blood-transfused NICU infants who required respiratory support such as mechanical ventilation during hospitalization generally had longer hospital stays than those who needed only supplemental oxygen or did not require respiratory support. This finding is similar to previous research 23 and points to a strong and consistent correlation between the need for invasive respiratory support and the length of hospital stay in transfusion-dependent infants. Patients who require invasive respiratory support may have more severe conditions, such as serious illnesses or injuries, that necessitate ongoing supportive treatments including blood transfusions. Transfusions can help improve oxygenation by providing sufficient hemoglobin and oxygen transport 23,24. On the other hand, these patients may face a higher risk of complications such as infections 25, which can further prolong hospitalization. Moreover, patients requiring mechanical ventilation may take longer to wean off the ventilator and may require respiratory physiotherapy after extubation to promote recovery of lung function. The need for invasive respiratory support is therefore an important predictor of the length of hospital stay in transfusion-dependent patients. Healthcare professionals should consider this when developing patient care plans and allocating hospital resources. Appropriate management of invasive respiratory support may help shorten the length of hospital stay and potentially improve patient prognosis.
UVC is a common invasive procedure in neonates 26. It involves inserting a catheter into the neonate's umbilical vein for purposes such as blood transfusion, fluid administration, medication delivery, or hemodynamic monitoring 27. While umbilical vein catheterization is considered effective and safe, it is important to note that UVC is associated with a longer duration of hospitalization. The procedure involves local anesthesia, manipulation, and fixation of the neonate, which may cause discomfort or complications such as infection and bleeding 28–30. As a result, hospitalization time for neonates requiring blood transfusion may be extended. On the other hand, in some severely ill neonates, umbilical vein catheterization may be necessary for ongoing treatment or monitoring. In such situations, the presence of the catheter extends the duration of hospitalization, as transfusion-dependent neonates require longer periods of observation and treatment.
Nosocomial infection in hospitalized newborns is prevalent and carries significant consequences. It is widely recognized as one of the most frequently encountered adverse events during their hospitalization 31. The immune system of newborns is relatively weak, with poor resistance, making them vulnerable to various pathogens. Infections can lead to severe complications and even endanger lives 32,33. In our study, the predictor variables contributing to long LOS in transfused children included sepsis. Previous studies have also confirmed that sepsis is a risk factor for long hospitalization 10,34,35. Sepsis may require long-term symptom management, antimicrobial therapy, shock management, and nutritional support, and it carries a high risk of complications, thereby extending treatment time and LOS. Research has shown that the median LOS for infected newborns is twice that of uninfected newborns 19. This indicates that infections have a significant impact on the length of hospital stay for newborns. The occurrence of sepsis in transfused children may necessitate additional treatment and rehabilitation, also increasing medical costs and family burden.
In our study, we identified the number of resuscitation attempts as an important predictor variable affecting the length of hospital stay for newborns. This finding, which has not been previously mentioned in existing research, may be attributed to variations in healthcare policies across different regions. Newborns requiring multiple resuscitation attempts, particularly those in need of blood transfusions, often have severe conditions with multiple organ dysfunctions or systemic diseases. These infants require prolonged medical monitoring, treatment, and rehabilitation to stabilize their condition. Additionally, frequent resuscitation attempts can induce fatigue and physical stress in newborns. To ensure their safety and stability, healthcare teams typically extend the duration of hospital observation, providing necessary rehabilitation and monitoring to prevent further deterioration.
Our research provides healthcare professionals with a visual predictive tool for identifying transfused infants at higher risk of long LOS. This allows clinicians to differentiate infants with a higher risk of long LOS, enabling them to plan general resource allocation accordingly. By identifying high-risk patients in advance, clinicians can plan for sufficient beds, equipment, and staff in the neonatal intensive care unit. The LOS prediction model also identifies potentially modifiable predictor variables that are associated with long hospital stays in transfused infants. The impact of modifying care to optimize these predictor variables could be studied in future research.

Limitations
The present study has several limitations that should be acknowledged. First, the discharge standards across hospitals may differ, which may confound the results. Second, although our model's internal validation demonstrated excellent calibration and discrimination, external validation is still required using additional datasets to confirm its reliability. Third, as our study was conducted at the largest children's medical center in the region, there may be a selection bias towards critically ill premature infants. Thus, generalizing our findings to other healthcare settings with different patient populations should be done with caution. Furthermore, to establish the robustness and reliability of our results, prospective multicenter studies are needed. Finally, future research could explore including additional predictive variables to improve the model's performance and predictive capabilities.

Conclusions
The present study introduced a novel nomogram that demonstrates satisfactory accuracy in assisting clinicians to assess the risk of long LOS in infants undergoing blood transfusion. GA (< 28 weeks), BW (< 1000 g), respiratory support type, UVC, sepsis, and rescue frequency (≥ 3 times) were related to an increased risk of long NICU stay in infants receiving blood transfusion. This model can help healthcare professionals stratify the risk of long hospital stay for children undergoing blood transfusion in the NICU, conduct appropriate clinical interventions, and allocate medical resources effectively.
Figure 7. Decision curve analysis of nomograms. Using the nomogram to predict long length of stay yields a higher net benefit than using no prediction model (i.e., the treat-all or treat-none scheme) across a majority of threshold probabilities (> 2%).
14:6877 | https://doi.org/10.1038/s41598-024-57502-3

Evaluation of the performance of the predictive model
The discriminatory capacity of the developed model was validated in both the training set and the validation set. The AUC for the nomogram in the training set was 0.851 (95% CI 0.805–0.891), as shown in Fig. 4A. Similarly, in the validation set, the AUC was 0.859 (95% CI 0.789–0.920), as depicted in Fig. 4B. These results indicate that the model exhibited good discriminatory and predictive abilities. The calibration curves showed that the model's predicted probabilities agreed closely with the observed probabilities (Fig. 5A,B). Based on 1000 rounds of resampling, the mean absolute error (MAE) obtained for the training set calibration curve was 0.024 with a sample size of 154. Similarly, for the validation set calibration curve, the MAE achieved through 1000 rounds of resampling was 0.01 with a sample size of 385. The Hosmer-Lemeshow test indicated no significant difference between the model's predictions and the observed values (p > 0.05).
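The AUC reported above can be understood as the probability that a randomly chosen infant with long LOS receives a higher predicted risk than a randomly chosen infant with normal LOS. The following minimal Python sketch computes it by direct pairwise comparison; the labels and predicted probabilities are hypothetical, and this is not the routine used in the study, which was carried out in R.

```python
def auc_score(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = long LOS) and predicted risks, not study data.
y_true = [1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
auc = auc_score(y_true, y_prob)
print(round(auc, 3))
```

The pairwise form is quadratic in the sample size, which is acceptable for cohorts of a few hundred infants; rank-based formulas give the same value more efficiently on large datasets.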

Figure 2. Feature selection. (A) Variable selection using the LASSO logistic regression model. The dashed line on the left represents the minimum criterion, and the 1-SE of the minimum criterion (represented by the dashed line on the right) is used to determine the optimal parameter (lambda) in the LASSO model. (B) LASSO coefficient profiles of the 57 features. (C) Features with non-zero coefficients selected by LASSO.

Table 3. Different model performance. AUC, area under the curve; Sen, sensitivity; Spe, specificity; Acc, accuracy; Pre, precision; PPV, positive predictive value; NPV, negative predictive value. M1a, M2a, and M3a represent the apparent performance of the models fitted using the training set, while M1b, M2b, and M3b correspond to the performance of the models evaluated using the validation set. *P values are from the DeLong test comparing the AUC values of different models. M1 represents the final model illustrated in the nomogram. M2 introduces RDS, NEC, and pneumorrhagia; M3 introduces RDS, NEC, pneumorrhagia, 1-min and 5-min Apgar scores, HDP, and ALB.

Figure 5. Calibration curves. (A) Agreement between predicted probabilities of long LOS and observed outcomes within the training set. (B) Agreement between predicted probabilities of long LOS and observed outcomes within the validation set. X-axis: predicted probabilities; Y-axis: observed proportions of long LOS. Deviations from the 'Ideal' line indicate potential errors, and points aligning with the 'Ideal' line indicate good calibration. The 'Apparent' estimate is uncorrected and biased, the 'Bias-corrected' estimate improves accuracy, and the 'Ideal' estimate serves as a benchmark. Tick marks show percentiles. Mean absolute error (MAE) measures overall accuracy. The sample size (n) is reported for each estimate.

Figure 6. Fivefold cross-validation. Fivefold cross-validation involves dividing the dataset into 5 equal parts, where 4 parts are used for training and 1 part is used for validation. This process is repeated 5 times, with a different subset serving as the validation set each time. The results are then averaged to provide an evaluation metric for the model's performance.

Table 2. Predictive factors for long LOS in infants undergoing blood transfusion. GA, gestational age; BW, birth weight; UVC, umbilical venous catheterization; OR, odds ratio; CI, confidence interval.

Figure 3. A nomogram predicting the risk of long length of stay in infants with blood transfusions. The long LOS risk nomogram was developed in the cohort with GA, BW, respiratory support type, UVC, sepsis, and rescue frequency.